Overview

Dataset statistics

Number of variables: 10
Number of observations: 1000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 78.2 KiB
Average record size in memory: 80.1 B
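
The average record size is consistent with dividing the total memory footprint by the number of observations. A quick check of that arithmetic, assuming the report uses binary units (1 KiB = 1024 B); the variable names are illustrative only:

```python
# Figures copied from the dataset statistics in this report.
total_size_kib = 78.2
n_observations = 1000

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations
print(round(avg_record_bytes, 1))
```

Under that assumption the result rounds to the 80.1 B shown in the report.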

Variable types

Categorical: 2
Numeric: 8

Warnings

O3 has constant value "0.0" (Constant)
Time has a high cardinality: 999 distinct values (High cardinality)
Time is uniformly distributed (Uniform)
PM25 has 903 (90.3%) zeros (Zeros)
VOC has 104 (10.4%) zeros (Zeros)
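
The constant, high-cardinality and zeros warnings can all be derived from simple value counts. The following is a minimal sketch of such checks, not the profiling library's actual implementation: the function name `sketch_warnings` and both thresholds are hypothetical, and the input lists are synthetic stand-ins that only reproduce the counts reported for PM25 and O3.

```python
from collections import Counter


def sketch_warnings(name, values, zeros_threshold=0.10, cardinality_threshold=50):
    """Return illustrative warning strings for one column.

    Both thresholds are hypothetical defaults chosen for this sketch;
    they are not taken from the report's configuration.
    """
    n = len(values)
    if n == 0:
        return []
    counts = Counter(values)
    is_text = isinstance(values[0], str)
    found = []

    if len(counts) == 1:
        only_value = next(iter(counts))
        found.append(f'{name} has constant value "{only_value}"')
    elif is_text and len(counts) > cardinality_threshold:
        found.append(f"{name} has a high cardinality: {len(counts)} distinct values")

    if not is_text:
        zeros = counts.get(0, 0)
        if zeros / n > zeros_threshold:
            found.append(f"{name} has {zeros} ({zeros / n:.1%}) zeros")
    return found


# Synthetic stand-ins that match the reported counts, not the real data.
pm25 = [0.0] * 903 + [0.21] * 97
o3 = ["0.0"] * 1000

print(sketch_warnings("PM25", pm25))
print(sketch_warnings("O3", o3))
```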

Reproduction

Analysis started: 2021-04-01 11:03:50.300361
Analysis finished: 2021-04-01 11:04:04.390771
Duration: 14.09 seconds
Software version: pandas-profiling v2.12.0
Download configuration: config.yaml

Variables

Time
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 999
Distinct (%): 99.9%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Value               Count
09:34:53                2
10:45:52                1
09:20:55                1
14:15:03                1
11:04:03                1
Other values (994)    994

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Characters and Unicode

Total characters: 8000
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
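
As a rough illustration of how such character properties can be tallied, the sketch below uses Python's standard `unicodedata` module to count general categories for the first three Time values from the sample rows. It covers categories only; scripts and blocks are not exposed by the standard library and would need additional data.

```python
import unicodedata
from collections import Counter

# First three Time values from the sample rows of this report.
samples = ["16:33:01", "16:32:09", "16:31:16"]

# unicodedata.category returns the two-letter general category:
# "Nd" is Decimal Number, "Po" is Other Punctuation.
category_counts = Counter(
    unicodedata.category(ch) for value in samples for ch in value
)
print(category_counts)
```

Each value contributes six digits and two colons, which matches the 75% / 25% split between Decimal Number and Other Punctuation reported for this variable.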

Unique

Unique: 998
Unique (%): 99.8%
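
The difference between the distinct count (999) and the unique count (998) follows from one value occurring twice: "distinct" counts different values, while "unique" counts values that occur exactly once. A small sketch under that reading, using synthetic placeholder values rather than the real timestamps:

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# Synthetic column: 998 placeholder values seen once, plus one value seen twice.
values = [f"t{i}" for i in range(998)] + ["09:34:53", "09:34:53"]
print(distinct_and_unique(values))
```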

Sample

1st row: 16:33:01
2nd row: 16:32:09
3rd row: 16:31:16
4th row: 16:30:21
5th row: 16:29:29

Common values

Value               Count  Frequency (%)
09:34:53                2  0.2%
10:45:52                1  0.1%
09:20:55                1  0.1%
14:15:03                1  0.1%
11:04:03                1  0.1%
10:42:45                1  0.1%
11:10:08                1  0.1%
10:07:37                1  0.1%
11:35:32                1  0.1%
16:24:15                1  0.1%
Other values (989)    989  98.9%
Histogram of lengths of the category

Most occurring characters

Value               Count  Frequency (%)
:                    2000  25.0%
1                    1390  17.4%
0                     900  11.2%
5                     672  8.4%
2                     619  7.7%
4                     612  7.6%
3                     602  7.5%
6                     317  4.0%
7                     300  3.8%
8                     295  3.7%

Most occurring categories

Value               Count  Frequency (%)
Decimal Number       6000  75.0%
Other Punctuation    2000  25.0%

Most frequent character per category

Decimal Number

Value               Count  Frequency (%)
1                    1390  23.2%
0                     900  15.0%
5                     672  11.2%
2                     619  10.3%
4                     612  10.2%
3                     602  10.0%
6                     317  5.3%
7                     300  5.0%
8                     295  4.9%
9                     293  4.9%

Other Punctuation

Value               Count  Frequency (%)
:                    2000  100.0%

Most occurring scripts

Value               Count  Frequency (%)
Common               8000  100.0%

Most frequent character per script

Value               Count  Frequency (%)
:                    2000  25.0%
1                    1390  17.4%
0                     900  11.2%
5                     672  8.4%
2                     619  7.7%
4                     612  7.6%
3                     602  7.5%
6                     317  4.0%
7                     300  3.8%
8                     295  3.7%

Most occurring blocks

Value               Count  Frequency (%)
ASCII                8000  100.0%

Most frequent character per block

Value               Count  Frequency (%)
:                    2000  25.0%
1                    1390  17.4%
0                     900  11.2%
5                     672  8.4%
2                     619  7.7%
4                     612  7.6%
3                     602  7.5%
6                     317  4.0%
7                     300  3.8%
8                     295  3.7%

Temp
Real number (ℝ≥0)

Distinct: 439
Distinct (%): 43.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 36.89512
Minimum: 29.31
Maximum: 42.94
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 29.31
5-th percentile: 33.7895
Q1: 34.66
median: 35.94
Q3: 39.67
95-th percentile: 40.261
Maximum: 42.94
Range: 13.63
Interquartile range (IQR): 5.01
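
The range and IQR are derived from the other quantile statistics: range = maximum - minimum and IQR = Q3 - Q1. The sketch below computes quartiles with Python's `statistics.quantiles`, assuming linear interpolation between order statistics (the `"inclusive"` method); the helper name `quartile_summary` and the toy input are illustrative, and the last lines cross-check the figures reported for Temp.

```python
import statistics


def quartile_summary(values):
    """Quartiles via linear interpolation, plus the derived IQR and range."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Q1": q1,
        "median": median,
        "Q3": q3,
        "IQR": q3 - q1,
        "Range": max(values) - min(values),
    }


# Toy data, not taken from the profiled dataset.
toy_summary = quartile_summary([1, 2, 3, 4, 5, 6, 7, 8])
print(toy_summary)

# Cross-check of the derived figures reported for Temp.
temp_iqr = round(39.67 - 34.66, 2)
temp_range = round(42.94 - 29.31, 2)
print(temp_iqr, temp_range)
```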

Descriptive statistics

Standard deviation: 2.506371258
Coefficient of variation (CV): 0.06793232433
Kurtosis: -1.313489231
Mean: 36.89512
Median Absolute Deviation (MAD): 2.015
Skewness: 0.06617196177
Sum: 36895.12
Variance: 6.281896882
Monotonicity: Not monotonic
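
Several of these descriptive statistics are related: the coefficient of variation is the standard deviation divided by the mean, and the variance is the square of the standard deviation. The sketch below computes them on toy data, assuming the sample standard deviation (n - 1 denominator) and MAD taken as the median of absolute deviations from the median. The helper name is illustrative, and the final lines check the relations against the figures reported for Temp.

```python
import statistics


def descriptive_summary(values):
    """Mean, sample standard deviation, CV, variance and MAD for a list of numbers."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1)
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    return {"mean": mean, "std": std, "cv": std / mean,
            "variance": std ** 2, "mad": mad}


# Toy data, not taken from the profiled dataset.
toy = descriptive_summary([2, 4, 4, 4, 5, 5, 7, 9])
print(toy)

# Relations checked against the values reported for Temp.
reported_std, reported_mean = 2.506371258, 36.89512
print(reported_std / reported_mean)  # compare with the reported CV
print(reported_std ** 2)             # compare with the reported variance
```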
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
39.69                  17  1.7%
39.67                  16  1.6%
39.74                  13  1.3%
39.76                  11  1.1%
39.7                   11  1.1%
39.77                  11  1.1%
39.72                  10  1.0%
34.31                   9  0.9%
39.65                   9  0.9%
39.6                    9  0.9%
Other values (429)    884  88.4%

Minimum 5 values

Value               Count  Frequency (%)
29.31                   1  0.1%
29.73                   1  0.1%
30.28                   1  0.1%
30.61                   1  0.1%
31.32                   1  0.1%

Maximum 5 values

Value               Count  Frequency (%)
42.94                   1  0.1%
42.89                   1  0.1%
42.86                   1  0.1%
42.4                    1  0.1%
41.84                   1  0.1%

PM25
Real number (ℝ≥0)

ZEROS

Distinct: 17
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.01845
Minimum: 0
Maximum: 0.22
Zeros: 903
Zeros (%): 90.3%
Negative: 0
Negative (%): 0.0%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 0.21
Maximum: 0.22
Range: 0.22
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.05833438437
Coefficient of variation (CV): 3.161755251
Kurtosis: 6.572066325
Mean: 0.01845
Median Absolute Deviation (MAD): 0
Skewness: 2.910259912
Sum: 18.45
Variance: 0.0034029004
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=17)

Common values

Value               Count  Frequency (%)
0                     903  90.3%
0.21                   62  6.2%
0.2                    10  1.0%
0.22                    6  0.6%
0.19                    4  0.4%
0.07                    2  0.2%
0.17                    2  0.2%
0.02                    2  0.2%
0.01                    1  0.1%
0.16                    1  0.1%
Other values (7)        7  0.7%

Minimum 5 values

Value               Count  Frequency (%)
0                     903  90.3%
0.01                    1  0.1%
0.02                    2  0.2%
0.03                    1  0.1%
0.05                    1  0.1%

Maximum 5 values

Value               Count  Frequency (%)
0.22                    6  0.6%
0.21                   62  6.2%
0.2                    10  1.0%
0.19                    4  0.4%
0.18                    1  0.1%

lux
Real number (ℝ≥0)

Distinct: 330
Distinct (%): 33.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 63.61273
Minimum: 6.76
Maximum: 1859.2
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 6.76
5-th percentile: 10.78
Q1: 16.0175
median: 81.28
Q3: 84.39
95-th percentile: 92.04
Maximum: 1859.2
Range: 1852.44
Interquartile range (IQR): 68.3725

Descriptive statistics

Standard deviation: 116.6429709
Coefficient of variation (CV): 1.833641959
Kurtosis: 205.285406
Mean: 63.61273
Median Absolute Deviation (MAD): 11.32
Skewness: 13.73549663
Sum: 63612.73
Variance: 13605.58266
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
10.86                  57  5.7%
10.94                  41  4.1%
10.78                  39  3.9%
84.36                  21  2.1%
85.16                  18  1.8%
84.52                  16  1.6%
81.44                  15  1.5%
85.32                  15  1.5%
81.6                   14  1.4%
82.88                  14  1.4%
Other values (320)    750  75.0%

Minimum 5 values

Value               Count  Frequency (%)
6.76                    1  0.1%
6.92                    3  0.3%
7                       1  0.1%
7.16                    1  0.1%
7.24                    4  0.4%

Maximum 5 values

Value               Count  Frequency (%)
1859.2                  1  0.1%
1820.16                 1  0.1%
1805.44                 1  0.1%
1803.52                 1  0.1%
259.6                   1  0.1%

VOC
Real number (ℝ≥0)

ZEROS

Distinct: 466
Distinct (%): 46.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2580.952
Minimum: 0
Maximum: 60000
Zeros: 104
Zeros (%): 10.4%
Negative: 0
Negative (%): 0.0%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 27.75
median: 170.5
Q3: 337.75
95-th percentile: 10811.2
Maximum: 60000
Range: 60000
Interquartile range (IQR): 310

Descriptive statistics

Standard deviation: 10076.37519
Coefficient of variation (CV): 3.904131185
Kurtosis: 25.63271084
Mean: 2580.952
Median Absolute Deviation (MAD): 146
Skewness: 5.102247517
Sum: 2580952
Variance: 101533337
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
0                     104  10.4%
60000                  26  2.6%
12                     11  1.1%
9                      11  1.1%
32                     10  1.0%
234                     8  0.8%
8                       8  0.8%
22                      7  0.7%
241                     7  0.7%
25                      7  0.7%
Other values (456)    801  80.1%

Minimum 5 values

Value               Count  Frequency (%)
0                     104  10.4%
1                       6  0.6%
2                       3  0.3%
3                       4  0.4%
4                       4  0.4%

Maximum 5 values

Value               Count  Frequency (%)
60000                  26  2.6%
49264                   1  0.1%
44454                   1  0.1%
32637                   1  0.1%
32044                   1  0.1%

CO
Real number (ℝ≥0)

Distinct: 273
Distinct (%): 27.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.88559
Minimum: 3.03
Maximum: 61.49
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 3.03
5-th percentile: 3.38
Q1: 3.54
median: 3.69
Q3: 3.9925
95-th percentile: 42.7715
Maximum: 61.49
Range: 58.46
Interquartile range (IQR): 0.4525

Descriptive statistics

Standard deviation: 11.85801595
Coefficient of variation (CV): 1.503757607
Kurtosis: 6.630497451
Mean: 7.88559
Median Absolute Deviation (MAD): 0.18
Skewness: 2.833060716
Sum: 7885.59
Variance: 140.6125422
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
3.57                   61  6.1%
3.69                   51  5.1%
3.7                    30  3.0%
3.38                   29  2.9%
3.39                   24  2.4%
3.72                   23  2.3%
3.56                   23  2.3%
3.52                   23  2.3%
3.47                   23  2.3%
3.68                   22  2.2%
Other values (263)    691  69.1%

Minimum 5 values

Value               Count  Frequency (%)
3.03                    2  0.2%
3.33                   10  1.0%
3.34                   12  1.2%
3.35                    4  0.4%
3.36                    1  0.1%

Maximum 5 values

Value               Count  Frequency (%)
61.49                   1  0.1%
55.45                   1  0.1%
54.31                   1  0.1%
53.89                   1  0.1%
53.85                   1  0.1%

CO2
Real number (ℝ≥0)

Distinct: 259
Distinct (%): 25.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1847.559
Minimum: 400
Maximum: 57330
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 400
5-th percentile: 400
Q1: 400
median: 400
Q3: 530
95-th percentile: 1149.15
Maximum: 57330
Range: 56930
Interquartile range (IQR): 130

Descriptive statistics

Standard deviation: 8522.870237
Coefficient of variation (CV): 4.613043609
Kurtosis: 38.21506597
Mean: 1847.559
Median Absolute Deviation (MAD): 0
Skewness: 6.319841758
Sum: 1847559
Variance: 72639317.07
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
400                   622  62.2%
57330                  21  2.1%
467                     5  0.5%
435                     4  0.4%
484                     4  0.4%
403                     4  0.4%
536                     4  0.4%
431                     4  0.4%
509                     3  0.3%
530                     3  0.3%
Other values (249)    326  32.6%

Minimum 5 values

Value               Count  Frequency (%)
400                   622  62.2%
401                     1  0.1%
403                     4  0.4%
404                     1  0.1%
405                     1  0.1%

Maximum 5 values

Value               Count  Frequency (%)
57330                  21  2.1%
56554                   1  0.1%
55063                   1  0.1%
13682                   1  0.1%
9773                    1  0.1%

O3
Categorical

CONSTANT
REJECTED

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Value               Count
0.0                  1000

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 3000
Distinct characters: 2
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common values

Value               Count  Frequency (%)
0.0                  1000  100.0%
Histogram of lengths of the category

Most occurring characters

Value               Count  Frequency (%)
0                    2000  66.7%
.                    1000  33.3%

Most occurring categories

Value               Count  Frequency (%)
Decimal Number       2000  66.7%
Other Punctuation    1000  33.3%

Most frequent character per category

Decimal Number

Value               Count  Frequency (%)
0                    2000  100.0%

Other Punctuation

Value               Count  Frequency (%)
.                    1000  100.0%

Most occurring scripts

Value               Count  Frequency (%)
Common               3000  100.0%

Most frequent character per script

Value               Count  Frequency (%)
0                    2000  66.7%
.                    1000  33.3%

Most occurring blocks

Value               Count  Frequency (%)
ASCII                3000  100.0%

Most frequent character per block

Value               Count  Frequency (%)
0                    2000  66.7%
.                    1000  33.3%

RH
Real number (ℝ≥0)

Distinct: 717
Distinct (%): 71.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 26.07621
Minimum: 14.21
Maximum: 100
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 14.21
5-th percentile: 15.5895
Q1: 19.03
median: 24.075
Q3: 27.225
95-th percentile: 56.3045
Maximum: 100
Range: 85.79
Interquartile range (IQR): 8.195

Descriptive statistics

Standard deviation: 12.77487643
Coefficient of variation (CV): 0.489905413
Kurtosis: 10.11149938
Mean: 26.07621
Median Absolute Deviation (MAD): 4.305
Skewness: 2.932377073
Sum: 26076.21
Variance: 163.1974678
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
19.17                   8  0.8%
19.25                   7  0.7%
24.74                   7  0.7%
19.12                   5  0.5%
24.23                   5  0.5%
14.26                   4  0.4%
25.06                   4  0.4%
24.9                    4  0.4%
19.78                   4  0.4%
23.81                   4  0.4%
Other values (707)    948  94.8%

Minimum 5 values

Value               Count  Frequency (%)
14.21                   1  0.1%
14.24                   2  0.2%
14.25                   1  0.1%
14.26                   4  0.4%
14.36                   1  0.1%

Maximum 5 values

Value               Count  Frequency (%)
100                     2  0.2%
98.8                    1  0.1%
97.94                   1  0.1%
97.81                   1  0.1%
95.44                   1  0.1%

Pres
Real number (ℝ≥0)

Distinct: 380
Distinct (%): 38.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 978.97492
Minimum: 976.58
Maximum: 981.11
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 976.58
5-th percentile: 977.0995
Q1: 977.91
median: 979.015
Q3: 979.98
95-th percentile: 980.95
Maximum: 981.11
Range: 4.53
Interquartile range (IQR): 2.07

Descriptive statistics

Standard deviation: 1.212822135
Coefficient of variation (CV): 0.001238869465
Kurtosis: -1.075901529
Mean: 978.97492
Median Absolute Deviation (MAD): 0.995
Skewness: 0.04759768411
Sum: 978974.92
Variance: 1.470937531
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
980                    10  1.0%
979.28                  9  0.9%
979.99                  9  0.9%
979.31                  8  0.8%
977.77                  7  0.7%
980.94                  7  0.7%
977.7                   7  0.7%
979.32                  7  0.7%
979.98                  7  0.7%
977.66                  7  0.7%
Other values (370)    922  92.2%

Minimum 5 values

Value               Count  Frequency (%)
976.58                  2  0.2%
976.59                  1  0.1%
976.6                   1  0.1%
976.61                  1  0.1%
976.62                  1  0.1%

Maximum 5 values

Value               Count  Frequency (%)
981.11                  1  0.1%
981.1                   1  0.1%
981.09                  1  0.1%
981.08                  1  0.1%
981.07                  1  0.1%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
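
The calculation just described can be sketched in plain Python as follows. The function name `pearson_r` and the toy inputs are illustrative and are not taken from the profiled data; the 1/n normalisation factors of the covariance and the standard deviations cancel, so they are omitted.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centred products; the shared normalisation factor cancels.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # perfectly linear, increasing
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))   # perfectly linear, decreasing
```

A constant column such as O3 has zero variance, so this sketch would divide by zero for it.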

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
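
A minimal sketch of that procedure: convert each variable to ranks, then apply the covariance-over-standard-deviations formula to the ranks. Giving tied values the average of their positions is an assumption of this sketch, and the function names are illustrative.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Pearson correlation computed on the rank variables of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


print(average_ranks([10, 20, 20, 30]))
print(spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))  # monotonic but nonlinear
```

The second call illustrates why ρ catches nonlinear monotonic relationships: the squares are not a linear function of the inputs, yet their ranks agree exactly.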

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is then the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
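
The pair-counting definition can be sketched directly. This is the simple form with no tie correction, so it may differ from tie-adjusted variants that statistical libraries commonly report; the function name and toy inputs are illustrative.

```python
from itertools import combinations


def kendall_tau_simple(xs, ys):
    """(concordant - discordant) / total number of pairs, without tie correction."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1      # both variables move in the same direction
        elif sign < 0:
            discordant += 1      # the variables move in opposite directions
    total_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / total_pairs


# Toy data: five concordant pairs and one discordant pair out of six.
tau = kendall_tau_simple([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```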

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available separately.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
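
Nullity by column amounts to counting the missing entries of each variable. A minimal sketch, assuming missing values are represented as `None` or NaN; the helper name is illustrative, and the gaps in the example rows are hypothetical, since the profiled dataset reports 0 missing cells.

```python
import math


def missing_per_column(rows, columns):
    """Count entries that are None or NaN in each named column."""
    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    return {
        col: sum(1 for row in rows if is_missing(row.get(col)))
        for col in columns
    }


rows = [
    {"Temp": 35.89, "RH": 24.08},         # values from the first sample row
    {"Temp": None, "RH": 24.07},          # hypothetical gap, not in the real data
    {"Temp": 35.96, "RH": float("nan")},  # hypothetical gap, not in the real data
]
nullity = missing_per_column(rows, ["Temp", "RH"])
print(nullity)
```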

Sample

First rows

     Time      Temp   PM25  lux    VOC   CO    CO2    O3   RH     Pres
0    16:33:01  35.89  0.0   81.76  7.0   3.47  400.0  0.0  24.08  978.20
1    16:32:09  35.88  0.0   81.90  1.0   3.44  400.0  0.0  24.07  978.22
2    16:31:16  35.96  0.0   80.94  24.0  3.47  400.0  0.0  24.04  978.30
3    16:30:21  35.91  0.0   81.60  19.0  3.47  400.0  0.0  24.25  978.34
4    16:29:29  35.88  0.0   81.44  0.0   3.47  400.0  0.0  24.02  978.28
5    16:28:38  35.94  0.0   81.12  27.0  3.47  400.0  0.0  23.90  978.30
6    16:26:54  35.91  0.0   81.12  6.0   3.47  400.0  0.0  23.85  978.23
7    16:25:09  36.00  0.0   81.12  19.0  3.48  400.0  0.0  23.73  978.27
8    16:24:15  36.02  0.0   80.48  33.0  3.48  400.0  0.0  23.82  978.30
9    16:23:23  36.02  0.0   80.32  36.0  3.48  400.0  0.0  23.81  978.28

Last rows

     Time      Temp   PM25  lux    VOC   CO    CO2    O3   RH     Pres
990  05:57:19  39.84  0.0   10.78  35.0  3.71  400.0  0.0  21.20  976.67
991  05:56:28  39.85  0.0   10.86  32.0  3.70  400.0  0.0  21.22  976.67
992  05:55:36  39.84  0.0   10.78  43.0  3.69  400.0  0.0  21.29  976.59
993  05:54:45  39.82  0.0   10.78  31.0  3.69  400.0  0.0  21.30  976.60
994  05:53:55  39.82  0.0   10.86  21.0  3.69  400.0  0.0  21.29  976.63
995  05:52:08  39.82  0.0   10.78  31.0  3.69  400.0  0.0  21.36  976.69
996  05:51:17  39.80  0.0   10.86  27.0  3.69  400.0  0.0  21.37  976.65
997  05:50:25  39.81  0.0   10.86  19.0  3.69  400.0  0.0  21.36  976.61
998  05:49:35  39.81  0.0   10.86  22.0  3.69  400.0  0.0  21.37  976.58
999  05:48:43  39.82  0.0   10.78  15.0  3.69  400.0  0.0  21.38  976.58